TITLE OF THE INVENTION
SMAD-INTERACTING POLYPEPTIDES AND THEIR USE 

CROSS-REFERENCE TO RELATED APPLICATIONS 
[0001] This application is a divisional application of U.S. Patent Application No. 09/449,285,
filed on 24 November 1999, now U.S. Patent 6,313,280 B1, which itself claims priority from pending
application PCT/EP98/03193, filed on 28 May 1998 designating the United States of America, which
itself claims priority from European Patent Application EP 97201645.5, filed on 2 June 1997.

REFERENCE TO A "SEQUENCE LISTING" 
The computer readable form of the sequence listing in this application is identical to that
filed in U.S. Patent Application No. 09/449,285, filed 24 November 1999. In accordance with 37
CFR § 1.821(e), please use the last-filed computer readable form filed in that application as the
computer readable form for the instant application. The paper copy of the instant application is
identical to the computer readable copy filed for Application No. 09/449,285.

TECHNICAL FIELD 

[0002] The present invention relates to SMAD-interacting polypeptides ("SIPs"), such
as cofactors for SMAD proteins, and the use thereof.

BACKGROUND 

[0003] The development from a single cell to a fully organized organism is a complex
process involving cell division and differentiation. Certain proteins play a central role in
this process. These proteins are divided into different families, of which the transforming growth
factor β ("TGF-β") family of ligands, their serine/threonine kinase ("STK") receptors and their
signaling components are undoubtedly key regulatory polypeptides. Members of the TGF-β
superfamily have been documented to play crucial roles in early developmental events such as
mesoderm formation and gastrulation, but also at later stages in processes such as neurogenesis,
[0007] To demonstrate clearly whether SMAD proteins have a function in transcriptional
regulation, either directly or indirectly, it is necessary to identify putative cofactors of SMAD
proteins and response elements in target genes for these SMAD proteins and/or cofactors, and to
investigate the ligand-dependency of these activities.

[0008] To understand these interactions, molecular and developmental biology research on
(i) functional aspects of the ligands, receptors and signaling components (in particular, members of
the SMAD family) in embryogenesis and disease, (ii) structure-function analysis of the ligands and
the receptors, (iii) the elucidation of signal transduction, (iv) the identification of cofactors for SMAD
(related) proteins, and (v) ligand-responsive genes in cultured cells and in Drosophila, amphibian,
fish and murine embryos is of the utmost importance.

DISCLOSURE OF THE INVENTION 

[0009] We have found that SMAD-interacting protein(s) are obtainable by carrying out a
two-hybrid screening assay in which a SMAD C-domain fused to a DNA-binding domain is used as
"bait" and a vertebrate cDNA library is used as "prey". It is evident to those of skill in the art that
other appropriate cDNA libraries can be used as well. By using, for example, the SMAD1 C-domain
fused to the GAL4 DNA-binding domain as bait and a mouse embryo cDNA library as prey, a partial
SMAD4 cDNA and other SMAD-interacting protein (SIP) cDNAs, including SIP1, were obtained.

[0010] Surprisingly, it has been found that at least four SMAD-interacting proteins thus
obtained contain a DNA-binding zinc finger domain. One of these proteins, SIP1, is a novel member
of the family of zinc finger/homeodomain proteins including δ-crystallin enhancer binding protein
and Drosophila zfh-1, the former of which has been identified as a DNA-binding repressor. It has
been shown that one DNA-binding domain of SIP1 (the C-terminal zinc finger cluster, or SIP1CZF)
binds to E2 box regulatory sequences and to the Brachyury protein binding site. It has been
demonstrated in cells that SIP1 interferes with E2 box- and Brachyury-mediated transcription
activation. SIP1 fails to interact with full-size SMAD in yeast. We have shown for the first time that
SMAD proteins can interact with a DNA-binding repressor and, as such, appear to be directly
involved in TGF-β ligand-controlled repression of target genes that are involved in the strict
regulation of normal early development.
[0011] In summary, characteristics of SIP1 include the following:

a) it fails to interact with full-size XSMAD1 in yeast,

b) it is a new member of the family of zinc finger/homeodomain proteins including δ-crystallin
enhancer binding protein and/or Drosophila zfh-1,

c) SIP1CZF binds to E2 box sites,

d) SIP1CZF binds to the Brachyury protein binding site,

e) it interferes with Brachyury-mediated transcription activation in cells, and

f) it interacts with the C-domain of SMAD1, 2 and/or 5.

[0012] As used herein, "E2 box sites" means the conserved regulatory nucleotide sequence
CACCTG, which contains the binding site CACCT for δ-crystallin enhancer binding proteins, as
described in Sekido et al., 1996, Gene, 173, p. 227-232. These E2 box sites are known targets for
important basic helix-loop-helix (bHLH) factors such as MyoD, a transcription factor in
embryogenesis and myogenesis.
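As an illustration only, the following Python sketch scans a DNA sequence for the E2 box CACCTG, or for its CACCT core, on both strands. A site lying on the opposite strand appears as the reverse complement (CAGGTG for the full E2 box) when the given strand is read. The function names and the example sequence are hypothetical and are not taken from the Sequence Listing.

```python
import re

# Complement table for the four standard DNA bases (uppercase input assumed).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for every occurrence of motif.

    '+' hits match the motif as written; '-' hits match its reverse complement,
    i.e. the motif lies on the opposite strand. A zero-width lookahead is used
    so that overlapping occurrences are also reported.
    """
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for match in re.finditer(f"(?={pattern})", seq):
            hits.append((match.start(), strand))
    return sorted(hits)


# Hypothetical sequence carrying one E2 box on each strand.
example = "TTCACCTGAACAGGTGTT"
print(find_motif(example, "CACCTG"))  # [(2, '+'), (10, '-')]
print(find_motif(example, "CACCT"))   # [(2, '+'), (11, '-')]
```

A sequence match alone says nothing about whether a site is actually bound; that question is addressed experimentally by the gel retardation and competition assays of Example IV.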

[0013] Thus, SIP1 according to the invention (a zinc finger/homeodomain protein) binds
to specific sites in the promoter regions of a number of genes that are relevant for the immune
response and early embryogenesis, and as such may be involved in the transcriptional regulation of
important differentiation genes in biological processes such as cell growth and differentiation,
embryogenesis, and abnormal cell growth, including cancer.

[0014] The invention also includes an isolated nucleic acid sequence including the nucleotide
sequence as provided in SEQ ID NO: 1 coding for a SMAD-interacting protein or a functional
fragment thereof.

[0015] Furthermore, a recombinant expression vector including the isolated nucleic acid
sequence (in sense or antisense orientation) operably linked to a suitable control sequence belongs
to the present invention, as do cells transfected or transduced with such a recombinant expression
vector.

[0016] Another aspect of the invention is a polypeptide including the amino acid sequence
according to SEQ ID NO: 2 or a functional fragment thereof. The present invention also includes
variants or homologues of the disclosed polypeptides in which amino acids are modified and/or
substituted by other amino acids, as would be obvious to a person skilled in the art. For
example, post-expression modifications of the polypeptide such as phosphorylations are not excluded
from the scope of the current invention.

[0017] Pharmaceutical compositions including the previously identified nucleic acid(s) or
including the polypeptide(s) are another aspect of the invention. The nucleic acid and/or polypeptide
according to the invention can optionally be used for appropriate gene therapy purposes.

[0018] In addition, methods for the diagnosis, prognosis and/or follow-up of a disease or
disorder using the nucleic acid(s) or the polypeptide(s) according to the invention form an important
aspect of the current invention. Furthermore, in such methods for the diagnosis, prognosis and/or
follow-up of a disease or disorder, an antibody directed against a polypeptide or fragment thereof
according to the current invention can also be conveniently used. As used herein, the term
"antibody" refers, without limitation, to (preferably purified) polyclonal antibodies or monoclonal
antibodies, altered antibodies, univalent antibodies, Fab proteins, single domain antibodies or
chimeric antibodies. In many cases, the binding phenomenon of antibodies to antigens is equivalent
to other ligand/anti-ligand binding.

[0019] Diagnostic kits including nucleic acid sequence(s) and/or polypeptide(s) according to
the invention, or antibodies directed against the polypeptide or a fragment thereof, for performing
the previously identified methods for diagnosing a disease or disorder also belong to the invention.

[0020] Diseases or disorders in this respect are, for instance, related to cancer,
malformation, immune or neural diseases, or bone metabolism-related diseases or disorders. In
addition, a disease affecting organs such as skin, lung, kidney, pancreas, stomach, gonad, muscle or
intestine can be diagnosed as well using the diagnostic kit according to the invention.

[0021] Using the nucleic acid sequences of the invention as a basis, oligomers of
approximately 8 nucleotides or more can be prepared, either by excision or synthetically, which
hybridize, for instance, with a sequence coding for SIP or a functional part thereof and are thus
useful in the identification of SIP in diseased individuals. These so-called "probes" are of a length
that allows sequences unique to the target to be detected or determined by hybridization as defined
above. While 6-8 nucleotides may be a workable length, sequences of about 10-12 nucleotides are
preferred, and about 20 nucleotides appears optimal. The nucleotide sequence may be labeled, for
example, with a radioactive compound, biotin, an enzyme, a dyestuff or metal sol, or a fluorescent or
chemiluminescent compound. The probes can be packaged into diagnostic kits. Diagnostic kits include
the probe nucleotide sequence, which may be labeled; alternatively, the probe may be unlabeled and
the ingredients for labeling may be included in the kit in separate containers so that the probe can
optionally be labeled. The kit may also contain other suitably packaged reagents and materials needed
for the particular hybridization protocol, for example, standards and wash buffers, as well as
instructions for conducting the test.
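A minimal sketch of how candidate probes of about 20 nucleotides might be chosen: every window of the chosen length in a target sequence that occurs only once across the target and a set of background sequences (counting both strands) is kept, and the probe is the reverse complement of that window, i.e. the strand that would hybridize to it. The function names, the uniqueness criterion and the toy sequences are assumptions made for illustration; a practical design would also take account of melting temperature, GC content and the hybridization conditions discussed in paragraph [0041].

```python
from collections import Counter

# Complement table for the four standard DNA bases (uppercase input assumed).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(sequences: list[str], k: int) -> Counter:
    """Count every k-mer on both strands of every sequence."""
    counts: Counter = Counter()
    for seq in sequences:
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                counts[strand[i:i + k]] += 1
    return counts


def candidate_probes(target: str, background: list[str], k: int = 20) -> list[tuple[int, str]]:
    """Return (0-based start, probe) for each k-mer of target that is unique.

    A window is kept only if it occurs exactly once across target and
    background (both strands); the probe returned is its reverse complement.
    """
    counts = kmer_counts([target, *background], k)
    probes = []
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        if counts[window] == 1:
            probes.append((i, reverse_complement(window)))
    return probes


# Toy example with k = 3 so the result is easy to check by hand.
print(candidate_probes("ACGATCA", [], k=3))         # [(0, 'CGT'), (1, 'TCG'), (4, 'TGA')]
print(candidate_probes("ACGATCA", ["TTACG"], k=3))  # [(1, 'TCG'), (4, 'TGA')]
```

In the toy example the windows GAT and ATC are rejected because the palindromic GATC stretch makes each of them occur on both strands, and ACG is rejected once the background sequence also contains it.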

[0022] The diagnostic kit may include an antibody directed to a polypeptide or fragment
thereof according to the invention in order to set up an immunoassay. The design of the immunoassay
is subject to a great deal of variation, and many variants are known in the art. Immunoassays
may be based, for example, on competition, direct reaction, or sandwich-type assays.

[0023] An important aspect of the present invention is the development of a method of
screening for compounds (chemically synthesized or available from natural sources) which affect the
interaction between SMAD and SIPs, based on the present knowledge of the SMAD-interacting
polypeptides (so-called SIPs, such as SIP1 or SIP2 as specifically disclosed herein).

[0024] A transgenic animal harboring the nucleic acid(s) according to the invention in its
genome also belongs to the scope of this invention. The transgenic animal can also be used for testing
medicaments and as a therapy model. As used herein, a transgenic animal means a non-human animal
that has incorporated a foreign gene (called a transgene) into its genome. Because this gene is
present in germ line tissues, it is passed from parent to offspring, establishing lines of transgenic
animals from a first founder animal. As such, transgenic animals are recognized as specific species
variants or strains, following the introduction and integration of new gene(s) into their genome. The
term "transgenic" has been extended to chimeric or "knockout" animals in which gene(s), or parts of
genes, have been selectively disrupted or removed from the host genome.
d) it has an RNase activity, and

e) it interacts with the C-domain of SMAD1, 2 and/or 5.

[0033] The invention also includes a method for the post-transcriptional regulation of gene
expression by members of the TGF-β superfamily by manipulation or modulation of the interaction
between SMAD function and/or activity and mRNA stability.

BRIEF DESCRIPTION OF THE FIGURE 
[0034] FIG. 1 shows that the XSMAD1 C-domain interacts with SIP1 in mammalian cells
and that deletion of the 51 amino acid ("aa") long SBD (SMAD binding domain) in SIP1 abolishes the
interaction. COS-1 cells were transiently transfected with expression constructs encoding
N-terminally myc-tagged SIP1 and a GST-XSMAD1 C-domain fusion protein. The latter was purified
from cell extracts using glutathione-sepharose beads. Purified proteins were visualized after
SDS-PAGE and Western blotting using anti-GST antibody (Pharmacia) (Panel A, slim arrow).
Myc-tagged SIP1 protein was co-purified with the GST-XSMAD1 C-domain fusion protein, as shown
by Western blotting of the same material using anti-myc monoclonal antibody (Santa Cruz) (Panel C,
lane 1, thick arrow). Deletion of the 51 aa long SBD in SIP1 abolished this interaction (Panel C,
lane 2). Note that the amounts of purified GST-XSMAD1 C-domain protein and the levels of
expression of both SIP1 proteins (wild type and SIP1 del SBD) in total cell extracts were comparable
(compare lanes 1 and 2 in Panels A and B).

DETAILED DESCRIPTION OF THE INVENTION 
[0035] A two-hybrid screening assay for use with the invention may be performed as
generally described by Chien et al., 1991, PNAS, 88, p. 9578-9582.

[0036] The polypeptide or fragments thereof included within the invention are not
necessarily translated from the nucleic acid sequence according to the invention but may be generated
in any manner, including, for example, chemical synthesis or expression in a recombinant expression
system. Generally, "polypeptide" refers to a polymer of amino acids (an amino acid sequence) and
does not refer to a specific length of the molecule. Thus, linear peptides, cyclic or branched peptides,
peptides with non-natural or non-standard amino acids such as D-amino acids, ornithine and the like,
oligopeptides and proteins are all included within the definition of polypeptide. The terms "protein"
and "polypeptide", as used herein, are generally interchangeable. The definition also includes
post-translational modifications of the polypeptide, for example, glycosylations, acetylations,
phosphorylations and the like. Included within the definition are, for example, polypeptides
containing one or more analogs of an amino acid (including, for example, unnatural amino acids),
polypeptides with substituted linkages, as well as other modifications known in the art, both naturally
occurring and non-naturally occurring.

[0037] "Control sequence", as used herein, refers to regulatory DNA sequences that are
necessary to effect the expression of coding sequences to which they are ligated. The nature of such
control sequences differs depending upon the host organism. In prokaryotes, control sequences
generally include a promoter, a ribosomal binding site and terminators. In eukaryotes, control
sequences generally include promoters, terminators and, in some instances, enhancers, transactivators,
transcription factors or 5' and 3' untranslated cDNA sequences. The term "control sequence" is
intended to include, at a minimum, all components whose presence is necessary for expression, and
may also include additional advantageous components.

[0038] "Operably linked", as used herein, refers to a juxtaposition wherein the components
so described are in a relationship permitting them to function in their intended manner. A control
sequence "operably linked" to a coding sequence is ligated in such a way that expression of the
coding sequence is achieved under conditions compatible with the control sequences. In the case
where the control sequence is a promoter, it would be obvious to a skilled person to use
double-stranded nucleic acid.

[0039] As used herein, "fragment of a sequence" or "part of a sequence" means a truncated
sequence of the original sequence referred to. The truncated sequence (nucleic acid or protein
sequence) can vary widely in length; the minimum size is a sequence of sufficient size to provide at
least a comparable function and/or activity to that of the original sequence referred to, while the
maximum size is not critical. In some applications, the maximum size is usually not substantially
greater than that required to provide the desired activity and/or function(s) of the original sequence.
Typically, the truncated amino acid sequence will range from about 5 to about 60 amino acids in
length. More typically, however, the sequence will be a maximum of about 50 amino acids in length,
preferably a maximum of about 30 amino acids. It is usually desirable to select sequences of at least
about 10, 12 or 15 amino acids, up to a maximum of about 20 or 25 amino acids.
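The size ranges given above can be made concrete with a short Python sketch that enumerates all contiguous fragments of a polypeptide within a chosen length window. Purely as an example input, it uses the 51 aa SBD of SIP1 (SEQ ID NO: 21, given in Example III) and a window of 15 to 25 amino acids; the function name and the choice of sequence are illustrative assumptions, and enumerating fragments says nothing about which of them retain the function or activity of the original sequence.

```python
from collections.abc import Iterator

# SBD of SIP1 (SEQ ID NO: 21), used here only as an example input.
SBD = "QHLGVGMEAPLLGFPTMNSNLSEVQKVLQIVDNTVSRQKMDCKTEDISKLK"


def fragments(seq: str, min_len: int = 15, max_len: int = 25) -> Iterator[tuple[int, str]]:
    """Yield (1-based start, fragment) for every contiguous fragment of seq
    whose length lies between min_len and max_len, inclusive."""
    for length in range(min_len, max_len + 1):
        for start in range(len(seq) - length + 1):
            yield start + 1, seq[start:start + length]


all_fragments = list(fragments(SBD))
# A sequence of length L has L - n + 1 fragments of length n, so for L = 51 and
# n = 15..25 this gives 37 + 36 + ... + 27 = 352 fragments.
print(len(all_fragments))  # 352
```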

[0040] Furthermore, the current invention is not limited to the exact isolated nucleic acid
sequences specifically identified herein, including the nucleotide sequence of SEQ ID NO: 1; a
nucleic acid sequence that hybridizes to the nucleotide sequence of SEQ ID NO: 1 or a functional
part thereof and encodes a SMAD-interacting protein or a functional fragment thereof also belongs
to the present invention.

[0041] To clarify, as used herein, "hybridization" means conventional hybridization
conditions known to the skilled person, preferably appropriate stringent hybridization conditions.
Hybridization techniques for determining the complementarity of nucleic acid sequences are known
in the art. The stringency of hybridization is determined by a number of factors during hybridization,
including temperature, ionic strength, length of time and composition of the hybridization buffer.
These factors are outlined in, for example, Maniatis et al., 1982, Molecular Cloning: A Laboratory
Manual (Cold Spring Harbor Press, Cold Spring Harbor, N.Y.).

[0042] The term "antigen" refers to a polypeptide or group of peptides that include at
least one epitope. "Epitope" refers to an antibody binding site usually defined by a polypeptide
including 3 amino acids in a spatial conformation which is unique to the epitope; generally, an
epitope consists of at least 5 such amino acids and more usually of at least 8-10 such amino acids.

[0043] The invention is further explained by the following illustrative EXAMPLES: 

EXAMPLES 
Example I 

Yeast two-hybrid cloning of SMAD-interacting proteins

[0044] In order to identify cofactors for SMAD1, a two-hybrid screen in yeast was
carried out using the XSMAD1 C-domain fused to the GAL4 DNA-binding domain (GAL4dbd) as bait,
and a cDNA library from mouse embryo (12.5 dpc) as a source of candidate preys. The
GAL4dbd-XSMAD1 bait protein, on its own or in conjunction with an empty prey plasmid, failed to
induce GAL4-dependent HIS3 and LacZ transcription in the reporter yeast strain. Screening of
4 million yeast transformants identified about 500 colonies expressing HIS3 and LacZ. The colonies
displaying a phenotype that was dependent on expression of both the prey and the bait cDNAs were
then characterized. Plasmids were rescued and the prey cDNAs sequenced (SEQ ID NOS: 1-20 of the
enclosed Sequence Listing; for each nucleic acid sequence only one strand is depicted in the Listing).
Four of these (th1, th12, th76 and th74, also denominated in this application as SIP1, SIP2, SIP5
and SIP7, respectively) are disclosed in detail (embedded in SEQ ID NOS: 1, 2, 3, 4, 10, and 8,
respectively). One (th72, combined SEQ ID NOS: 6 and 7) encodes a protein in which the GAL4
transactivation domain (GAL4tad) is fused in-frame to a partial SMAD4 cDNA, which starts at amino
acid (aa) 252 in the proline-rich domain. SMAD4 has been shown to interact with other SMAD
proteins, but no SMAD had been picked up thus far in a two-hybrid screen in yeast using the
C-domain of another SMAD as bait. These data suggest that the N-domain of both interacting SMAD
proteins, as well as part of (SMAD4) or the entire (SMAD1) proline-rich domain, is dispensable for
heterodimeric interaction between SMAD proteins, at least in a two-hybrid assay in yeast.

[0045] The cDNA insert of the second positive prey plasmid, th1 (embedded in SEQ ID
NO: 1), encodes a protein in which the GAL4tad-coding sequence is fused in-frame to an
approximately 1.9 kb-long th1 cDNA, which encodes a polypeptide SIP1 (Th1) of 626 aa. Database
searches revealed that SIP1 (Th1) contains a homeodomain-like segment and represents a novel
member of a family of DNA-binding proteins including vertebrate δ-crystallin enhancer binding
proteins (δ-EF1) and Drosophila zfh-1. These zinc finger/homeodomain-containing transcription
factors are involved in organogenesis in mesodermal tissues and/or development of the nervous
system. The protein encoded by the th1 cDNA is a SMAD-interacting protein (SIP) and was named
SIP1 (TH1).







Example II 


Characterization of SIP1-SMAD interaction in yeast and in vitro

[0046] The binding of SIP1 (TH1) to full-size XSMAD1 and to modified XSMAD1 C-domains
was tested. The latter have either an amino acid substitution (G418S) or a deletion of the last 43 aa
(Δ424-466). The first mutation inactivates Mad, the Drosophila SMAD homolog, and abolishes
BMP-dependent phosphorylation of SMAD1 in mammalian cells. A truncated Mad, similar to mutant
Δ424-466, causes mutant phenotypes in Drosophila, while a similar truncation in SMAD4 (DPC4)
in a loss-of-heterozygosity background is associated with pancreatic carcinomas. SIP1 (TH1)
interacts neither with full-size XSMAD1 nor with mutant Δ424-466. The absence of any detectable
association with full-size XSMAD1 was not due to inefficient expression of the latter in yeast, since
one other SMAD-interacting prey (th12) efficiently interacted with the full-length SMAD bait. The
lack of association of SIP1 (TH1) with full-size XSMAD1 in yeast is consistent with previous
suggestions that the activity of the SMAD C-domain is repressed by the N-domain, and that this
repression is eliminated in mammalian cells by incoming BMP signals. The G418S mutation in the
C-domain of SMAD1 does not abolish interaction with SIP1, suggesting that this mutation affects
another aspect of SMAD1 function. The ability of the full-size G418S SMAD protein to become
functional through activated receptor STK activity may thus be affected, but not the ability of the
G418S C-domain to interact with downstream targets. This indicates that activation of SMAD is a
prerequisite for, and precedes, interaction with targets such as SIP1. The deletion in mutant
Δ424-466 includes three conserved and functionally important serines at the C-terminus of SMAD,
which are direct targets for phosphorylation by the activated type I STK receptor.

[0047] The C-domains of SMAD1 and SMAD2 induce ventral or dorsal mesoderm,
respectively, when over-expressed individually in Xenopus embryos, despite their very high degree
of sequence conservation. Very recently, SMAD5 has been shown to induce ventral fates in the
Xenopus embryo. To investigate whether the striking differences in biological activity of SMAD1,
-5 and SMAD2 could be due to distinct interactions with cofactors, the ability of SIP1 (TH1) protein
to interact with the C-domains of SMAD1, -5 and SMAD2 in a yeast two-hybrid assay was tested.
SIP1 (TH1) was found to interact in yeast with the C-domains of all three SMAD members. The
interaction of SIP1 with different SMAD C-domains was then investigated in vitro, using
glutathione-S-transferase ("GST") pull-down assays. GST-SMAD fusion proteins were produced in
E. coli and coupled to glutathione-sepharose beads. An unrelated GST fusion protein and unfused
GST were used as negative controls. Radio-labeled, epitope-tagged SIP1 protein was successfully
produced in mammalian cells using a vaccinia virus (T7 W)-based system. Using GST-SMAD beads,
this SIP1 protein was pulled down from cell lysates, and its identity was confirmed by Western
blotting. Again, as in yeast, SIP1 was found to be a common binding protein for different SMAD
C-domains, suggesting that SIP1 might mediate common responses of cells to different members of
the TGF-β superfamily. Alternatively, SMAD proteins may have different affinities for SIP1 in vivo,
or other mechanisms might determine the specificity, if any, of the SMAD-SIP1 interaction.



Example III 

SIP1 is a new member of the δ-EF1 family of zinc finger/homeodomain proteins

[0048] Additional SIP1 open reading frame sequences were obtained by a combination of
cDNA library screening and 5'-RACE PCR. The screening yielded a 3.2 kb-long SIP1 cDNA (tw6),
which partially overlaps the th1 cDNA. The open reading frame of SIP1 encodes 944 aa
(SEQ ID NO: 2) and shows homology to certain regions in the δ-EF1, ZEB, AREB6, BZP and zfh-1
proteins, as well as a strikingly similar organization of putative functional domains. Like these
proteins, SIP1 contains two zinc finger clusters separated by a homeodomain and a glutamic
acid-rich domain. Detailed comparisons reveal that SIP1 is a novel and divergent member of the
two-handed zinc finger/homeodomain proteins. As in δ-EF1, three of the five residues that are
conserved in helix 3 and 4 of all canonical homeodomains are not present in SIP1. SIP1 (Th1), which
contains the homeodomain but lacks the C-terminal zinc finger cluster and the glutamic acid-rich
sequence, interacts with SMAD. This interaction is maintained upon removal of the homeodomain-like
domain, indicating that a segment encoding aa 44-236 of SIP1 (numbering according to SEQ ID
NO: 2) is sufficient for interaction with SMAD. To narrow this domain down further, progressive
deletion mutants, starting from the N-terminus as well as from the C-terminus of this 193 aa region,
were made. Progressive 20 aa deletion constructs were generated by PCR. Two restriction sites (a
SmaI site at the 5' end and an XhoI site at the 3' end) were built in to allow cloning of the amplified
sequences in the yeast two-hybrid prey vector pACT2 (Clontech). An extensive two-hybrid experiment
was performed with these so-called SBD mutant constructs as prey and the XSMAD1 C-domain as
bait. The mutant SBD constructs that encoded aa 166-236 (of SEQ ID NO: 2) or aa 44-216 were
still able to interact with the bait plasmid, whereas mutant constructs encoding aa 186-236 or
aa 44-196 could not interact with the bait. In this way, the smallest domain that still interacts with
the XSMAD1 C-domain was defined as a 51 aa domain encompassing aa 166-216 of SEQ ID NO: 2.
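The reasoning used to define the minimal SBD can be restated as a small Python sketch. Assuming that a single contiguous segment is required for binding, that segment must lie within every construct that still binds, so its boundaries are the largest start and the smallest end among the binding constructs; each non-binding construct should then lack part of it. The residue ranges are those given in the text (inclusive, numbered according to SEQ ID NO: 2); the function names are illustrative.

```python
# Two-hybrid results from the text: inclusive residue ranges, SEQ ID NO: 2 numbering.
BINDERS = [(44, 236), (166, 236), (44, 216)]
NON_BINDERS = [(186, 236), (44, 196)]


def common_region(ranges: list[tuple[int, int]]) -> tuple[int, int]:
    """Return the region shared by all ranges (largest start, smallest end)."""
    start = max(s for s, _ in ranges)
    end = min(e for _, e in ranges)
    if start > end:
        raise ValueError("ranges do not overlap")
    return start, end


def contains(construct: tuple[int, int], region: tuple[int, int]) -> bool:
    """True if the construct spans the whole region."""
    return construct[0] <= region[0] and region[1] <= construct[1]


sbd = common_region(BINDERS)
length = sbd[1] - sbd[0] + 1
print(sbd, length)  # (166, 216) 51
# Consistency check: no non-binding construct retains the complete region.
print([contains(c, sbd) for c in NON_BINDERS])  # [False, False]
```

Because the constructs differ in steps of 20 aa, this approach locates each boundary only to within that step size.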

[0049] The amino acid sequence of the SBD, which is necessary for the interaction with
SMAD, is thus (in one-letter code):

QHLGVGMEAPLLGFPTMNSNLSEVQKVLQIVDNTVSRQKMDCKTEDISKLK (SEQ ID NO: 21).
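As a simple cross-check between paragraphs [0048] and [0049], the sketch below confirms that the one-letter sequence of SEQ ID NO: 21 is 51 residues long, in agreement with the range aa 166-216 of SEQ ID NO: 2, and maps SEQ ID NO: 2 positions onto the SBD string. It assumes, as stated in the text, that the SBD begins at residue 166; the helper name is hypothetical.

```python
# SBD of SIP1 (SEQ ID NO: 21) in one-letter code.
SBD = "QHLGVGMEAPLLGFPTMNSNLSEVQKVLQIVDNTVSRQKMDCKTEDISKLK"
SBD_START, SBD_END = 166, 216  # inclusive positions in SEQ ID NO: 2
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def sbd_residue(position: int) -> str:
    """Return the residue at a SEQ ID NO: 2 position that lies inside the SBD."""
    if not SBD_START <= position <= SBD_END:
        raise IndexError(f"position {position} is outside aa {SBD_START}-{SBD_END}")
    return SBD[position - SBD_START]


print(len(SBD) == SBD_END - SBD_START + 1)  # True
print(set(SBD) <= STANDARD_AA)              # True
print(sbd_residue(166), sbd_residue(216))   # Q K
```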

[0050] Deletion of an additional 20 aa at the N- or C-terminal end of this region
disrupted the SMAD binding activity. Subsequently, this 51 aa region was deleted in the context of
the SIP1 protein, again using a PCR-based approach, generating an NcoI restriction site at the
position of the deletion. This SIP1ΔSBD51 protein was no longer able to interact with the SMAD
C-domain, as assayed by a "mammalian pull-down assay". In these experiments, SIP1, myc-tagged at
its N-terminal end, was expressed in COS-1 cells together with a GST-XSMAD1 C-domain fusion
protein. Myc-SIP1 protein was co-purified from cell extracts with the GST-XSMAD1 C-domain fusion
protein using glutathione-sepharose beads, as demonstrated by Western blotting using anti-myc
antibody. Deletion of the 51 aa in SIP1 abolished the interaction with the XSMAD1 C-domain, as
detected in this assay (see FIG. 1).
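The deletion of the 51 aa SBD leaves the remainder of the SIP1 reading frame intact because 51 codons correspond to 153 nucleotides, a multiple of three. The Python sketch below makes this arithmetic explicit for a coding sequence numbered from the A of the ATG start codon. The text does not specify exactly how the NcoI site (CCATGG) was introduced, so the optional insertion of the six-nucleotide site at the codon boundary is only a hypothetical illustration, as are the function names and the placeholder sequence.

```python
def codon_span(first_aa: int, last_aa: int) -> tuple[int, int]:
    """1-based inclusive nucleotide coordinates of codons first_aa..last_aa,
    counting the A of the ATG start codon as nucleotide 1."""
    return 3 * first_aa - 2, 3 * last_aa


def delete_codons(orf: str, first_aa: int, last_aa: int, insert: str = "") -> str:
    """Remove codons first_aa..last_aa from orf, optionally replacing them with insert."""
    start, end = codon_span(first_aa, last_aa)
    return orf[:start - 1] + insert + orf[end:]


start, end = codon_span(166, 216)
removed = end - start + 1
print(start, end, removed, removed % 3)  # 496 648 153 0

# Placeholder ORF of 944 codons plus a stop codon (not the real SIP1 sequence).
orf = "ATG" + "GCT" * 943 + "TAA"
edited = delete_codons(orf, 166, 216, insert="CCATGG")  # hypothetical NcoI placement
print(len(orf), len(edited), len(edited) % 3)  # 2835 2688 0
```

Inserting the whole six-nucleotide site at a codon boundary adds two codons, so the downstream frame is preserved in this illustration as well.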

Example IV 

Analysis of the DNA-binding activity of the C-terminal zinc finger cluster of SIP1

[0051] δ-EF1 is a repressor that regulates the enhancer activity of certain genes. This
repressor binds to the E2 box sequence (5'-CACCTG), which is also a binding site for a subgroup of
basic helix-loop-helix (bHLH) activators (Sekido et al., 1994, Mol. Cell. Biol., 14, p. 5692-5700).
Interestingly, the CACCT sequence, which has been shown to bind δ-EF1, is also part of the
consensus binding site for the Bra protein. It has been proposed that cell type-specific gene
expression is accomplished by competition between repressors and activators for binding to CACCT
sequences. δ-EF1-mediated repression could be the primary mechanism for silencing the IgH
enhancer in non-B cells. δ-EF1 is also present in B cells, but is counteracted there by E2A, a bHLH
factor specific for B cells. Similarly, δ-EF1 represses the Igκ enhancer, where it competes for
binding with the bHLH factor E47.

[0052] The C-terminal zinc finger cluster of δ-EF1 is responsible for binding to E2 box
sequences and for competition with activators. Considering the high sequence similarity in this
region between SIP1 and δ-EF1, it was decided to test first whether both proteins have similar
DNA-binding specificities, using gel retardation assays. Therefore, the DNA-binding properties of
the C-terminal zinc finger cluster of SIP1 (named SIP1CZF) were analyzed. SIP1CZF was efficiently
produced in and purified from E. coli as a short GST fusion protein. Larger GST-SIP1 fusion proteins
were subject to proteolytic degradation in E. coli.

[0053] Purified GST-SIP1CZF was shown to bind to the E2 box of the IgH κE2 enhancer. A
mutation of this site (Mut1), which was shown previously to affect the binding of the bHLH factor
E47 but not of δ-EF1, did not affect binding of SIP1CZF. Two other mutations in this κE2 site (Mut2
and Mut4) have been shown to abolish binding of δ-EF1 (Sekido et al., 1994) and also abolished
binding of SIP1CZF. In addition, binding of SIP1CZF to the Nil-2A binding site of the interleukin-2
promoter, to the Bra protein binding site and to the AREB6 binding site was demonstrated. The
specificity of the binding of SIP1CZF to the Bra binding site was further demonstrated in competition
experiments. Binding of SIP1CZF to this site was competed by excess unlabeled Bra binding site
probe; the κE2 wild-type probe also competed, albeit less efficiently than its variant Mut1, which is a
very strong competitor. κE2-Mut2 and κE2-Mut4 failed to compete, as did the GATA-2 probe, while
the AREB6 site competed very efficiently. From these experiments, it can be concluded that the
GST-SIP1CZF fusion protein displays the same DNA-binding specificity as other GST fusion proteins
made with the CZF region of δ-EF1 and related proteins (Sekido et al., 1994). In addition, it was
demonstrated for the first time that SIP1 binds specifically to regulatory sequences that are also
target sites for Bra. This may be the case for the other δ-EF1-related proteins as well, and these may
interfere with Bra-dependent gene activation in vivo.
[0054] Binding to sites recognized by the bHLH factor MyoD was also analyzed. MyoD has been shown to activate transcription from the muscle creatine kinase ("MCK") promoter by binding to E2 box sequences (Weintraub et al., 1994, Genes Dev. 8, p. 2203-2211; Katagiri et al., 1997, Exp. Cell Res. 230, p. 342-351). Interestingly, δEF1 has also been demonstrated to repress MyoD-dependent activation of the MCK enhancer, as well as myogenesis in 10T1/2 cells, and this is thought to involve E2 boxes (Sekido et al., 1994). In addition, TGF-β and BMP-2 have been reported to down-regulate the activity of muscle-specific promoters, and this inhibitory effect is mediated by E2 boxes (Katagiri et al., 1997). E2 boxes are present in the regulatory regions of many muscle-specific genes, are required for muscle-specific expression, and are optimally recognized by heterodimers between myogenic bHLH proteins (of the MyoD family) and widely expressed factors such as E47. SIP1czf was able to bind to a probe that encompasses the MCK enhancer E2 box, and this complex was competed by the E2 box oligonucleotide and by other SIP1 binding sites. In addition, a point mutation within this E2 box, similar to the previously used κE2-Mut4 mutation, also abolished binding of SIP1czf. These results confirm that SIP1czf binds to the E2 box of the MCK promoter. SIP1, as a SMAD-interacting and MCK E2 box-binding protein, may therefore represent the factor that mediates the TGF-β and BMP repression of the MyoD-regulated MCK promoter (Katagiri et al., 1997).



Example V 

SIP1 is a BMP-dependent repressor of the Bra activator

[0055] The experiments have demonstrated that SIP1czf binds to the Bra protein binding site, to the IL-2 promoter, and to E2 boxes, the latter being implicated in BMP- or TGF-β-mediated repression of muscle-specific genes. These observations therefore prompted a test of whether SIP1 (as SIP1TW6) is a BMP-regulated repressor. A reporter plasmid containing an SIP1 binding site (the Bra protein binding site) fused to the luciferase gene was constructed. COS cells, maintained in low-serum (0.2%) medium during the transfection, were used in subsequent transient transfection experiments, since they have been documented to express BMP receptors and to support signalling (Hoodless et al., 1996, Cell, 85, p. 489-500). In this experiment, SIP1TW6 was found to be unable to change the transactivation activity of the Bra protein via the Bra binding site. In addition, no transactivation of this reporter plasmid by SIP1TW6 could be detected in the presence of 10% or 0.2% serum and in the absence of the Bra expression vector.

[0056] Therefore, identical experiments were carried out in which the cells were exposed to BMP-4. SIP1TW6 repressed the Bra-mediated activation of the reporter, and did so in a dose-dependent fashion (amount of SIP1TW6 plasmid, concentration of BMP-4). Total repression was not obtained in this type of experiment, because the transfected COS cells were exposed to BMP-4 only after 24 hours. Consequently, luciferase mRNA and protein accumulate during the first 24 hours of the experiment as the result of Brachyury activity. These experiments clearly show that SIP1 is a repressor of the Bra activator and that its repressor activity is detected only in the presence of BMP. Importantly, SIP1 has not been found to be an activator of transcription via Bra target sites. This is interesting, since the presence in δEF1-like proteins of a polyglutamic acid-rich stretch (which is also present in the SIP1TW6 used here) has previously led to the speculation that these repressors might act as transcriptional activators as well. In particular, AREB6 has been shown to bind to the promoter of the housekeeping gene Na,K-ATPase α1 and to repress gene expression dependent on cell type and on the context of the binding site (Watanabe et al., 1993, J. Biochem. 114, p. 849-855).

Example VI 

SIP1 mRNA expression in mice

[0057] Northern analysis demonstrated the presence of a major 6 kb SIP1 mRNA in the embryo and in several tissues of adult mice, with very weak expression in liver and testis. A minor 9 kb transcript is also detected, which is, however, present in the 7 dpc embryo. In situ hybridization documented SIP1 transcription in the 7.5 dpc embryo in the extra-embryonic and embryonic mesoderm. The gene is weakly expressed in embryonic ectoderm. In the 8.5 dpc embryo, very strong expression is seen in extra-embryonic mesoderm (blood islands), neuroepithelium and neural tube, the first and second branchial arches, the optic eminence, and predominantly posterior presomitic mesoderm. Weaker but significant expression is detected in somites and notochord. Between day 8.5 and 9.5, this pattern extends clearly to the trigeminal and facio-acoustic neural crest tissue. Around mid-gestation, the SIP1 gene is expressed in the dorsal root ganglia, spinal cord,
a) SIP2 full-length sequence

[0060] Clone th12 showed high homology to a partial cDNA (KIAA0150) isolated from the human myeloblast cell line KG1. However, this human cDNA extends approximately 2 kb further at the 3' end than th12. Using this human cDNA, an EST library was screened, and mouse ESTs homologous to the 3' end of the KIAA0150 cDNA were detected. Primers designed on the basis of the th12 sequence and the mouse EST were used to amplify a cDNA that contains the stop codon at the 3' end. 5' sequences encompassing the start codon were obtained using 5' RACE-PCR.

[0061] GenBank accession numbers for the EST clones used to complete the SIP2 open reading frame: human KIAA0150, D63484; mouse EST sequence (Soares mouse p3NMF19.5), W82188.

[0062] Primers used to reconstitute the SIP2 open reading frame:
based on the th12 sequence: F3th12F (forward primer) 5'-cggcggcagatacgcctcctgca (SEQ ID NO: 22)
based on the EST sequence: th12mouse1 (reverse primer) 5'-caggagcagttgtgggtagagccttcatc (SEQ ID NO: 23)

[0063] Primers used for 5' RACE; all are reverse primers derived from the th12 sequence:
1: 5'-ctggactgagctggacctgtctctccagtac (SEQ ID NO: 24)
2: 5'-cacaagggagtatttcttgcgccacgaagg (SEQ ID NO: 25)
3: 5'-gccatggtgtgaggagaagc (SEQ ID NO: 26)
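As a reading aid for the primer lists above: a reverse primer anneals to the sense strand, so its reverse complement is the sense-strand th12 (or EST) sequence from which it was derived. The following minimal Python sketch computes reverse complements, lengths and GC content for the listed oligonucleotides. The helper names and dictionary labels are illustrative only, and the sketch assumes plain A/C/G/T sequences with no degenerate bases.

```python
# Illustrative helpers for the SIP2 primers listed in [0062]-[0063].
# Assumes unambiguous A/C/G/T sequences (no IUPAC degenerate codes).

COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

# Labels are descriptive only; sequences are copied from the text (5'->3').
PRIMERS = {
    "F3th12F (forward, SEQ ID NO: 22)": "cggcggcagatacgcctcctgca",
    "th12mouse1 (reverse, SEQ ID NO: 23)": "caggagcagttgtgggtagagccttcatc",
    "5'-RACE 1 (reverse, SEQ ID NO: 24)": "ctggactgagctggacctgtctctccagtac",
    "5'-RACE 2 (reverse, SEQ ID NO: 25)": "cacaagggagtatttcttgcgccacgaagg",
    "5'-RACE 3 (reverse, SEQ ID NO: 26)": "gccatggtgtgaggagaagc",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.lower()
    return (seq.count("g") + seq.count("c")) / len(seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}")
        # For a reverse primer, this is the sense-strand stretch it targets.
        print(f"  reverse complement: {reverse_complement(seq)}")
```

For example, RACE primer 3 (gccatggtgtgaggagaagc) corresponds to the sense-strand stretch gcttctcctcacaccatggc.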

[0064] The full-size SIP2 deduced from the assembly of these sequences contains 950 amino acids, as depicted in SEQ ID NO: 4; the nucleotide sequence is depicted in SEQ ID NO: 3.

b) SIP2 sequence homologies 

[0065] SIP2 contains a domain encompassing five CCCH-type zinc fingers. This domain is also found in other proteins, such as Clipper in Drosophila, No Arches in zebrafish and CPSF in mammals. No Arches is essential for development of the branchial arches in zebrafish, and CPSF is involved in transcription termination and polyadenylation. The domain containing the five CCCH zinc fingers in Clipper was shown to have an endoRNase activity (see below).
c) SIP2 CCCH domain has an RNase activity

[0066] The domain containing the five CCCH-type zinc fingers of SIP2 was fused to GST, and the fusion protein was purified from E. coli. This fusion protein displays an RNase activity when incubated with labelled RNA produced in vitro. In addition, it has been shown that this fusion protein is able to bind single-stranded DNA.

[0067] In more detail, GST fusion proteins of SIP2 5xCCCH, PLAG1 (an unrelated zinc finger protein), SIP1czf (C-terminal zinc finger cluster of SIP1), th1 (the partial SIP1 polypeptide isolated in the two-hybrid screen) and the cytoplasmic tail of CD40 were produced in E. coli and purified using glutathione-Sepharose beads. Three 35S-labelled substrates, previously used to demonstrate the RNase activity of Clipper, a related protein from Drosophila (Bai, C. and Tolias, P.P., 1996, Cleavage of RNA hairpins mediated by a developmentally regulated CCCH zinc finger protein, Mol. Cell. Biol. 16: 6661-6667), were produced by in vitro transcription. The RNA cleavage reactions with purified GST fusion proteins were performed in the presence of RNasin (blocking RNase A activity). Equal aliquots of each reaction were taken out at
time points \\ 1\ 15', 30', 60'. Degradation products were separated on a denaturing 
polyacrylamide gel and visualized by autoradiography. These experiments demonstrated that GST-SIP2 5xCCCH has an RNase activity and degrades all tested substrates, while GST-PLAG1, GST-CD40, GST-SIP1czf and GST-th1 do not have this activity.

[0068] Interaction between th12 (partial SIP2 polypeptide) and SMAD C-domains in GST pull-down experiments.

[0069] C-domains of Xenopus SMAD1 and mouse SMAD2 and SMAD5 were produced in E. coli as fusion proteins with glutathione S-transferase and coupled to glutathione beads. An unrelated GST fusion protein (GST-CD40 cytoplasmic tail) and GST itself were used as negative controls.

[0070] The th12 protein, provided with an HA tag at its N-terminal end, was produced in HeLa cells using the T7 vaccinia virus expression system and metabolically labelled. Expression of th12 was confirmed by immunoprecipitation with HA antibody, followed by SDS-PAGE and autoradiography. The th12 protein is produced as a protein of approximately 50 kDa. Cell extracts prepared from HeLa cells expressing this protein were mixed with GST-SMAD C-domain beads in GST pull-down buffer and incubated overnight at 4 °C. The beads were then washed four times in the same buffer, and the bound proteins were eluted in Laemmli sample buffer and separated by SDS-PAGE. "Pulled-down" th12 protein was visualized by Western blotting using HA antibody. These experiments demonstrate that th12 is efficiently pulled down by GST-SMAD C-domain beads, and not by GST-CD40 or GST alone.

Conclusion on SIP2 

[0071] SIP2 is a SMAD-interacting protein that has an RNase activity. The finding that SMADs interact with potential RNases provides an unexpected link between TGF-β signal transduction and mRNA stabilization.

Example VIII 

SIP5

Characterization of SIP5

[0072] One contiguous open reading frame is fused in frame to the GAL4 transactivating domain in the two-hybrid vector pACT-2 (Clontech). This represents a partial cDNA, since no in-frame translational stop codon is present. The sequence has no significant homology to anything in the database, but displays a region of high homology with the following EST clones:

[0073] Mouse accession numbers: AA212269 (Stratagene mouse melanoma), AA215020 (Stratagene mouse melanoma), AA794832 (Knowles Solter mouse 2 cell). Human accession numbers: AA830033, AA827054, AA687275, AA505145, AA371063.

[0074] Analysis of the interaction of the SIP5 prey protein with different bait proteins (which are described in the data section obtained with SIP1) in a yeast two-hybrid assay is as follows:

Empty bait vector pGBT9 

Full-length XSMAD1 +
XSMAD1 C-domain +
XSMAD1 C-domain with G418S substitution +
Mouse SMAD2 C-domain +
Mouse SMAD5 C-domain + 
Lamin (pLAM; Clontech) 

[0075] The partial SIP5 protein encoded by the above-described cDNA also interacts with XSMAD1, mouse SMAD2 and SMAD5 C-domains in vitro, as analyzed by the GST pull-down assay (previously described for SIP1 and SIP2). Briefly, the partial SIP5 protein was tagged with a myc tag at its C-terminal end and expressed in COS-1 cells. GST-SMAD C-domain fusion proteins, GST-CD40 cytoplasmic tail and GST alone were expressed in E. coli and coupled to glutathione-Sepharose beads. These beads were subsequently used to pull down partial SIP5 protein from COS cell lysates, as demonstrated by SDS-PAGE of pulled-down proteins followed by Western blotting using anti-myc antibody. In this assay, SIP5 was pulled down by GST-XSMAD1, GST-SMAD2 and GST-SMAD5 C-domains, but not by GST alone or GST-CD40. A partial, but coding, nucleic acid sequence for SIP5 is depicted in SEQ ID NO: 10.

Example IX 

SIP7 (Characterization of SIP7) 

[0076] One contiguous open reading frame is fused in frame to the GAL4 transactivating domain in the two-hybrid vector pACT2. This is a partial clone, since no in-frame translational stop codon is present. Part of this clone shows homology to Wnt-7b (accession number M89802), but the clone seems to be a novel cDNA or a cloning artefact. The homology of the SIP7 cDNA with the known Wnt-7b cDNA starts at nucleotide 390 and extends to nucleotide 846. This corresponds to nucleotides 74-530 of the Wnt-7b coding sequence (with the A of the translational start codon considered as nucleotide 1). In the SIP7 cDNA, this region of homology is preceded by a sequence that shows no homology to anything in the database. It is not clear whether the SIP7 cDNA is, for example, a new Wnt-7b transcript or a scrambled clone resulting from the fusion of two cDNAs during generation of the cDNA library.
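The coordinates quoted for the SIP7/Wnt-7b homology are mutually consistent: both intervals span 457 nt, so SIP7 position p corresponds to Wnt-7b coding position p − 316 across the homologous stretch. A short Python sketch of that mapping follows. It assumes an ungapped alignment (the text gives only the endpoints), and the function names are hypothetical.

```python
# Maps SIP7 cDNA coordinates onto the Wnt-7b coding sequence using the
# endpoints reported in [0076]. ASSUMPTION: ungapped alignment (constant
# offset); the text does not state this explicitly.

SIP7_START, SIP7_END = 390, 846   # homology region in the SIP7 cDNA
WNT_START, WNT_END = 74, 530      # same region in the Wnt-7b CDS (A of ATG = 1)

OFFSET = SIP7_START - WNT_START   # 316


def sip7_to_wnt7b(pos: int) -> int:
    """Convert a SIP7 cDNA position inside the homology region to a Wnt-7b CDS position."""
    if not SIP7_START <= pos <= SIP7_END:
        raise ValueError(f"position {pos} lies outside the homology region")
    return pos - OFFSET


def codon_of(cds_pos: int) -> tuple[int, int]:
    """Return (codon number, position within codon), both 1-based."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


if __name__ == "__main__":
    # Both intervals must have the same length for an ungapped match.
    assert SIP7_END - SIP7_START == WNT_END - WNT_START
    print("region length:", SIP7_END - SIP7_START + 1, "nt")
    for p in (SIP7_START, SIP7_END):
        w = sip7_to_wnt7b(p)
        print(p, "->", w, "codon/position", codon_of(w))
```

Under the same assumption, the homologous stretch begins at the second base of Wnt-7b codon 25 and ends at the second base of codon 177.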

[0077] Analysis of the interaction of the SIP7 prey protein with different bait proteins in a yeast two-hybrid assay can be summarized as follows:
pGBT9
Full-length XSMAD1
XSMAD1 C-domain +
XSMAD1 C-domain, G418S +
XSMAD1 C-domain del aa 424-466
XSMAD1 N-terminal domain
Mouse SMAD2 C-domain + 
Mouse SMAD5 C-domain + 
Lamin (pLAM) 

[0078] The partial SIP7 protein encoded by the above-described cDNA also interacts with XSMAD1, mouse SMAD2 and SMAD5 C-domains in vitro, as analyzed by the GST pull-down assay described above for SIP5. In this assay, N-terminally myc-tagged SIP7 protein was specifically pulled down by GST-XSMAD1, GST-SMAD2 and GST-SMAD5 C-domains, but not by GST alone or GST-CD40. A partial, but coding, nucleic acid sequence for SIP7 is depicted in SEQ ID NO: 8.

General description of the methods used 
Plasmids and DNA manipulations 

[0079] Mouse SMAD1 and SMAD2 cDNAs used in this study were identified by low-stringency screening of an oligo-dT-primed λExlox cDNA library made from 12 dpc mouse embryos (Novagen), using SMAD5 (ML? 1.2 clone as described in Meersseman et al., 1997, Mech. Dev., 61, p. 127-140) as a probe. The same library was used to screen for full-size SIP1 and yielded λExTW6. The tw6 cDNA was 3.6 kb long and overlapped with the th1 cDNA, but contained additional 3' coding sequences, including an in-frame stop codon. Additional 5' sequences were obtained by 5' RACE using the Gibco-BRL 5' RACE kit.

[0080] XSMAD1 full-size and C-domain bait plasmids were constructed using previously described EcoRI-XhoI inserts (Meersseman et al., 1997, Mech. Dev., 61, p. 127-140), cloned between the EcoRI and SalI sites of the bait vector pGBT9 (Clontech) such that in-frame fusions with GAL4dbd were obtained. Similar bait plasmids with mouse SMAD1, SMAD2 and SMAD5 were generated by amplifying the respective cDNA fragments encoding the C-domain using Pfu polymerase (Stratagene) and primers with EcoRI and XhoI sites. The G418S XSMAD1 C-domain was generated by oligonucleotide-directed mutagenesis (Bio-Rad).
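Cloning XhoI-ended inserts into a SalI site, as in [0080], works because both enzymes leave the same 5'-TCGA overhang; the ligated junction (CTCGAC or GTCGAG) is then recognized by neither enzyme. A minimal Python sketch of that reasoning follows. It uses the standard recognition sequences and cut positions for EcoRI, XhoI and SalI; the function names are hypothetical, and the model covers only palindromic sites that leave 5' overhangs.

```python
# Why an EcoRI-XhoI insert can be ligated between EcoRI and SalI sites:
# XhoI and SalI leave identical 5' overhangs. Only palindromic sites that
# cut before their midpoint (5' overhangs) are modelled here.

COMP = str.maketrans("ACGT", "TGCA")

# enzyme: (palindromic recognition site, top-strand cut offset)
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
    "SalI": ("GTCGAC", 1),   # G^TCGAC
}


def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]


def overhang(enzyme: str) -> str:
    """5' overhang (read 5'->3') left by a palindromic site."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def compatible(a: str, b: str) -> bool:
    """Two 5' overhangs anneal if one is the reverse complement of the other."""
    return overhang(a) == revcomp(overhang(b))


def hybrid_site(left: str, right: str) -> str:
    """Top strand across the junction of a `left`-enzyme end ligated to a `right`-enzyme end."""
    lsite, lcut = ENZYMES[left]
    rsite, rcut = ENZYMES[right]
    return lsite[:lcut] + rsite[rcut:]


if __name__ == "__main__":
    print("XhoI/SalI compatible:", compatible("XhoI", "SalI"))   # True
    junction = hybrid_site("XhoI", "SalI")
    print("junction:", junction)                                  # CTCGAC
    print("recut by either enzyme:",
          junction in (ENZYMES["XhoI"][0], ENZYMES["SalI"][0]))  # False
```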

[0081] To generate in-frame fusions of SMAD C-domains with GST, the same SMAD fragments were cloned in pGEX-5X-1 (Pharmacia). The phage T7 promoter-based SIP1 (TH1) construct for use in the T7VV system was generated by partial restriction of the th1 prey cDNA with BglII, followed by restriction with SalI, such that SIP1 (TH1) was lifted out of the prey vector along with an in-frame translational start codon, an influenza virus HA epitope tag, and a stop codon. This fragment was cloned into pGEM-3Z (Promega) for use in the T7VV system. A similar strategy was used to clone SIP2 (th12) into pGEM-3Z.

[0082] PolyA+ RNA from 12.5 dpc mouse embryos was obtained with OLIGOTEX-dT (Qiagen). Randomly primed cDNA was prepared using the SUPERSCRIPT CHOICE SYSTEM (Gibco-BRL). cDNA was ligated to an excess of SfiI double-stranded adaptors containing SfiI and BamHI sites. To facilitate cloning of the cDNAs, the prey plasmid pAct (Clontech) was modified to generate pAct/Sfi-Sfi. Restriction of this plasmid with SfiI generates sticky ends that are not complementary, so that self-ligation of the vector is prevented upon cDNA cloning. A library
containing 3.6 X 10^ independent recombinant clones with an average insert size of 1,100 bp was 
obtained. 
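The self-ligation argument in [0082] rests on the structure of the SfiI site, GGCCNNNN^NGGCC: the enzyme leaves a 3-nt 3' overhang taken from the variable core, so two SfiI sites with different cores produce ends that cannot pair. A minimal Python sketch follows. The actual pAct/Sfi-Sfi site sequences are not given in the text, so the two sites used below are hypothetical examples chosen only to have different cores, and the function names are likewise illustrative.

```python
# Sketch of why two different SfiI sites block vector self-ligation.
# SfiI recognizes GGCCNNNN^NGGCC and leaves a 3-nt 3' overhang from the
# variable NNNNN core. The example sites are hypothetical.
import re

COMP = str.maketrans("ACGT", "TGCA")
SFI_SITE = re.compile(r"^GGCC[ACGT]{5}GGCC$")


def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]


def sfi_ends(site: str) -> tuple[str, str]:
    """Return the 3' overhangs (each read 5'->3') on the left and right fragments."""
    if not SFI_SITE.match(site):
        raise ValueError(f"{site} is not an SfiI site")
    core = site[4:9]          # N1..N5
    left = core[1:4]          # top-strand N2N3N4 on the left fragment
    right = revcomp(left)     # bottom-strand overhang on the right fragment
    return left, right


def can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Two 3' overhangs pair if one is the reverse complement of the other."""
    return overhang_a == revcomp(overhang_b)


if __name__ == "__main__":
    # Two hypothetical SfiI sites with different variable cores.
    site_a = "GGCCATTACGGCC"
    site_b = "GGCCGCCTCGGCC"
    a_left, a_right = sfi_ends(site_a)
    b_left, b_right = sfi_ends(site_b)
    # Ends produced by one and the same site always re-anneal ...
    print(can_anneal(a_left, a_right))   # True
    # ... but the vector arm cut at site A cannot close on the arm cut at site B.
    print(can_anneal(a_left, b_right))   # False
```

Because a 3-nt overhang can never be its own reverse complement, two ends of the same type (for example, two left-fragment ends) cannot pair either.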

Synthesis of SIP1 and GST pull-down experiments

[0083] Expression of SIP1 (TH1) and SIP2 (TH12) in mammalian cells with the T7VV system and the preparation of the cell lysates were as described previously (Verschueren et al., 1995, Mech. Dev., 52, p. 109-123).

[0084] GST fusion proteins were expressed in E. coli (strain BL21) and purified on glutathione-Sepharose beads (Pharmacia). The beads were first washed four times with PBS supplemented with protease inhibitors and then mixed with 50 µl of lysate (prepared from T7VV-infected, SIP1-expressing mammalian cells) in 1 ml of GST buffer (50 mM Tris-HCl pH 7.5, 120 mM NaCl, 2 mM EDTA, 0.1% (v/v) NP-40, and protease inhibitors). They were mixed at 4 °C for 16 hours. Unbound proteins were removed by washing the beads four times with GST buffer. Bound proteins were harvested by boiling in sample buffer and resolved by SDS-PAGE. Separated proteins were visualized by autoradiography or by immunodetection after Western blotting, using anti-HA monoclonal antibody (12CA5) and alkaline phosphatase-conjugated anti-mouse secondary antibody (Amersham).
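Preparing the GST buffer of [0084] from concentrated stocks is a C1V1 = C2V2 calculation. The Python sketch below shows it for the 1 ml used per pull-down. The stock concentrations are assumptions chosen for illustration (the text gives only the final composition), and protease inhibitors are left out.

```python
# Volumes of stock solutions needed for GST buffer (C1*V1 = C2*V2).
# ASSUMPTION: the stock concentrations below are illustrative; the text
# gives only the final composition. Protease inhibitors are omitted.

FINAL_VOLUME_UL = 1000.0  # 1 ml of GST buffer per pull-down

# component: (final concentration, stock concentration), same units per row
RECIPE = {
    "Tris-HCl pH 7.5 (mM)": (50.0, 1000.0),  # assumed 1 M stock
    "NaCl (mM)": (120.0, 5000.0),            # assumed 5 M stock
    "EDTA (mM)": (2.0, 500.0),               # assumed 0.5 M stock
    "NP-40 (% v/v)": (0.1, 10.0),            # assumed 10% stock
}


def stock_volumes(recipe: dict, total_ul: float) -> dict:
    """Return microlitres of each stock plus the water needed to reach total_ul."""
    volumes = {name: final * total_ul / stock
               for name, (final, stock) in recipe.items()}
    volumes["water"] = total_ul - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, vol in stock_volumes(RECIPE, FINAL_VOLUME_UL).items():
        print(f"{name}: {vol:.1f} ul")
```

With these assumed stocks, 1 ml of buffer needs 50 µl Tris-HCl, 24 µl NaCl, 4 µl EDTA and 10 µl NP-40, made up with 912 µl water.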

EMSA (electrophoretic mobility shift assay)

[0085] The sequences of the κE2 WT and mutated κE2 oligonucleotides are identical to those disclosed in Sekido et al. (1994, Mol. Cell. Biol., 14, p. 5692-5700). The sequence of the AREB6 oligonucleotide was obtained from Ikeda et al. (1995, Eur. J. Biochem., 233, p. 73-82). The IL2 oligonucleotide is described in Williams et al. (1991, Science, 254, p. 1791-1794).

[0086] The sequence of the Brachyury binding site is 5'-TGACACCTAGGTGTGAATT-3' (SEQ ID NO: 27). The negative control GATA2 oligonucleotide sequences originated from the endothelin promoter (Dorfman et al., 1992, J. Biol. Chem., 267, p. 1279-1285). Double-stranded oligonucleotides were labelled with polynucleotide kinase and [γ-32P]ATP and purified from a 15% polyacrylamide gel. Gel retardation assays were performed according to Sekido et al. (1994, Mol. Cell. Biol., 14, p. 5692-5700).
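The Brachyury binding-site probe (SEQ ID NO: 27) contains an inverted repeat: the 12-bp stretch ACACCTAGGTGT reads identically on both strands. A minimal Python sketch that locates the longest such reverse-complement palindrome in a double-stranded site follows; the function name is hypothetical and the sketch accepts only A/C/G/T input.

```python
# Finds the longest stretch of a double-stranded site that equals its own
# reverse complement (an inverted repeat readable on both strands).
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}

BRACHYURY_SITE = "TGACACCTAGGTGTGAATT"  # SEQ ID NO: 27


def longest_rc_palindrome(seq: str) -> tuple[int, str]:
    """Return (0-based start, sequence) of the longest reverse-complement palindrome."""
    best_start, best_len = 0, 0
    # Reverse-complement palindromes have even length, so expand outward
    # from each gap between neighbouring bases.
    for centre in range(1, len(seq)):
        left, right = centre - 1, centre
        while left >= 0 and right < len(seq) and seq[left] == COMP[seq[right]]:
            left -= 1
            right += 1
        length = right - left - 1
        if length > best_len:
            best_start, best_len = left + 1, length
    return best_start, seq[best_start:best_start + best_len]


if __name__ == "__main__":
    start, core = longest_rc_palindrome(BRACHYURY_SITE)
    print(start, core)  # 2 ACACCTAGGTGT
```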

[0087] RESULTS OF TWO-HYBRID SCREENING (XSMAD1 C-domain bait versus 12.5 dpc mouse embryo library; 600,000 recombinant clones screened in 4x 10V yeasts).

[0088] SIP1 - Three independent clones isolated (th1, th88 and th94)

- Zinc-finger-homeodomain protein 

- Homology to δEF1 (see above)
- Interactions in yeast: 

XSMAD1 C-domain bait +

Empty bait 

Lamin 

XSMAD1 full-length -

XSMAD1 N-domain -

mSMAD1 C-domain +







mSMAD2 C-domain + 

mSMAD5 C-domain +

XSMAD1 C-domain del 424-466 -

XSMAD1 C-domain G418S +



* Interaction with the C-domain of XSMAD1 and mSMADs confirmed in vitro using GST pull-downs and co-immunoprecipitations

* Extended clone (TW6) isolated through library screening using th1 sequences as a probe

* C-terminal TW6 zinc-finger cluster binds to E2 box sequences (cf. δEF1), the Brachyury T binding site and Brachyury promoter sequences

SIP2 (also called clone TH12) - Three independent clones isolated (th12, th73, th93)

Highly homologous to the KIAA0150 gene product, isolated from the myeloblast cell line KG1 (Ref: Nagase et al. 1995; DNA Res 2 (4) 167-174).
Interactions in yeast: 

XSMAD1 C-domain bait +

Empty bait 

Lamin 

XSMAD1 full-length +
XSMAD1 N-domain ND
mSMAD1 C-domain +
mSMAD2 C-domain +
mSMAD5 C-domain +
XSMAD1 C-domain del 424-466
XSMAD1 C-domain G418S +



TH60 - Two independent clones isolated (th60 and th77) 
- Zinc finger protein 
- Homology to snail (transcriptional repressor) and to ATBF1 (complex homeodomain zinc finger protein)
- Interactions in yeast: 



XSMAD1 C-domain bait +



Empty bait 
Lamin 

TH72 - One clone isolated 

- Encodes a partial DPC-4 (SMAD4) cDNA (see above) 

- Interactions in yeast: 

XSMAD1 C-domain bait +++

Empty bait

Lamin

XSMAD1 full-length ND
XSMAD1 N-domain

mSMAD1 C-domain +++
mSMAD2 C-domain ND
mSMAD5 C-domain ++
XSMAD1 C-domain del 424-466
XSMAD1 C-domain G418S +

SIP5 (also called clone th76).

Analysis of the interaction of the SIP5 prey protein with different bait proteins (which are described in the data section obtained with SIP1) in a yeast two-hybrid assay can be summarized as follows:

Empty bait vector pGBT9 



Full-length XSMAD1 +
XSMAD1 C-domain +
XSMAD1 C-domain G418S +
Mouse SMAD2 C-domain +
Mouse SMAD5 C-domain +
Lamin (pLAM; Clontech)

SIP7 (also called clone th74) 

Analysis of the interaction of the SIP7 prey protein with different bait proteins in a yeast two-hybrid assay can be summarized as follows:
pGBT9
Full-length XSMAD1
XSMAD1 C-domain +
XSMAD1 C-domain, G418S +
XSMAD1 C-domain del aa 424-466
XSMAD1 N-terminal domain
Mouse SMAD2 C-domain +
Mouse SMAD5 C-domain +
Lamin (pLAM)

The following clones have been investigated less extensively. They are considered "true positives" because they interact with the XSMAD1 C-domain bait and not with the empty bait (i.e., GAL4 DBD alone).

TH75: -Three independent clones isolated (th75, th83, th89) 

-Partial aa sequences do not show significant homology to proteins in 
the public databases 



Interactions in yeast: 



XSMAD1 C-domain bait



Empty bait 



TH92: -Zinc finger protein 
-homology to KUP 

TH79, TH86, TH90: Partial sequences do not display significant homology to any protein sequence in the public databases.
ABSTRACT OF THE DISCLOSURE
The current invention concerns SMAD-interacting protein(s) obtainable by a two-hybrid screening assay in which the SMAD1 C-domain fused to the GAL4 DNA-binding domain is used as "bait" and a cDNA library from mouse embryo is used as "prey". A specific SMAD-interacting protein (SIP1), a member of the family of zinc finger/homeodomain proteins that includes δ-crystallin enhancer binding protein and/or Drosophila zfh-1, has the following characteristics: it is unable to interact with full-size XSMAD1 in yeast; SIP1czf binds to E2 box sites; SIP1czf binds to the Brachyury protein binding site and interferes with Brachyury-mediated transcription activation in cells; and SIP1 also interacts with the C-domain of SMAD1, SMAD2 and SMAD5. The minimal amino acid sequence necessary for binding to SMAD appears to be a 51-amino-acid domain encompassing amino acids 166-216 of SEQ ID NO: 2, having the amino acid sequence depicted in the one-letter code:
QHLGVGMEAPLLGFPTMNSNLSEVQK\a.QIVDNTVSRQKMDCKT^^ (SEQ ID NO: 

21). 